Abstract: This paper describes our approaches to the real-time
filtering task in the TREC 2015 Microblog track, which comprises
a push notification task on a mobile phone and a periodic email
digest task. For the push notification task, we apply a
recommendation framework with a ranking algorithm and dynamic
threshold adjustment that uses both the semantic content and the
quality of a tweet. External information extracted from the
Google search engine and a word2vec model trained on an existing
corpus is incorporated to better capture the interest expressed
by a tweet or a profile. For the email digest task, starting
from the candidate tweets retrieved in the first task, we score
each tweet using semantic features and quality features; the
tweets classified into a topic are then ranked with our keyword
Boolean logic model.

1.  Introduction 

Information retrieval and recommendation in online social
networks have attracted increasing attention with the
development of social network services. To explore users’
interests and improve retrieval and recommendation performance
in a real-time environment, TREC introduced a real-time ad hoc
search task in 2011 [1]. In that task, the information a user
wishes to see is represented by a query, and systems respond to
the query with a list of relevant tweets ordered by time,
starting from the time the query is issued. In other words,
systems should provide users with the most recent and relevant
tweets. The Microblog track in 2015 is a real-time filtering
task whose goal is to explore technologies for monitoring a
stream of social media posts with respect to a user’s interest
profile. Unlike a typical ad hoc query, there is no actual
information need. Instead, the goal is for a system to push
interesting content to a user. The notion of what is interesting
is made concrete in two task models: push notification on a
mobile phone (Scenario A) and periodic email digest (Scenario
B). In Scenario A, content that a system identifies as
interesting based on the user’s interest profile may be shown to
the user as a mobile phone notification. Under those
circumstances, notifications should be triggered a relatively
short time after the content is generated. In Scenario B,
content that a system identifies as interesting based on the
user’s interest profile may be aggregated into an email that is
periodically sent to the user. In that case, the user can read a
longer account of the content.

In Scenario A, we apply a recommendation framework with a
ranking algorithm and dynamic threshold adjustment. Semantic
features and quality features are extracted to improve retrieval
and recommendation performance in social media. For the semantic
features, we use several retrieval models, such as TFIDF, BM25,
and a keyword Boolean logic model, to calculate the relevance
score between a given profile and a tweet. To strengthen the
semantic features and mitigate the shortcomings of the
bag-of-words (BoW) model, we take advantage of a word2vec model
[2] [3] [4] built on existing corpora, such as Wikipedia,
KnowItAll [5], Freebase [6], and Probase [7]. To expand the
semantic features of profiles, we also use the Google search
engine to acquire external information: the abstract text of the
retrieval results helps us better understand the user’s
interests. For the quality features, we use several attributes
extracted from a tweet, such as the user who posted the tweet,
the number of reposts, the number of comments, the number of
URLs, the number of hashtags, the number of meaningful words,
and the length of the tweet. Topics from the TREC 2013 Microblog
track [8] are used for model training. With the manually labeled
data, we obtain our quality model. Finally, we combine the
semantic features and quality features to evaluate a tweet
comprehensively. Based on the dynamically adjusted threshold,
our system decides whether or not to push a tweet.

The candidate tweets identified in Scenario A are used as the
input of the task in Scenario B. We calculate the score of a
tweet from semantic features and quality features. The semantic
features are used to classify a tweet into a topic, or to drop
it if it does not match any topic; a tweet classified into a
topic receives a semantic score. Quality features are then used
to evaluate the importance and authority of the tweet, and the
quality model we trained yields its quality score. With a
ranking framework, the tweets classified into the same topic are
ranked, and the top k tweets are pushed as a digest to the users
who are interested in that topic.

The remainder of the paper is organized as follows. We first
present our approach to the push notification task on a mobile
phone in Section 2. In Section 3, we describe our system for the
periodic email digest task in detail. Section 4 presents our
experimental results and analysis. Finally, we conclude the
paper in Section 5.


2. Push Notification on a Mobile Phone Task


In this section, we first introduce our system architecture
for the push notification task on a mobile phone. Then, the
recommendation framework is described in detail. Finally, all
the components of the system are presented.

Figure 1. System Architecture Framework


2.1.  System  Overview 

This year’s Microblog track is a real-time task in which teams
listen to the Twitter stream [9] through the official common
API. In this section, we briefly discuss the architecture of our
system, which is shown in Figure 1. As depicted in the figure,
our system mainly contains four components, as follows,

1) Feature Extraction Component, which extracts features from
the Twitter stream, obtained through the TREC-API^1, and from
the profiles provided by the organizers. Before feature
extraction, data preprocessing and filtering are applied to
remove unnecessary data. For the Twitter stream, we extract
semantic features and social attributes. For profiles, we
extract keywords as our basic features.

2) Feature Representation Component, which represents and
expands semantic features using several techniques. The
information of tweets and profiles is enriched by this
component.

3) Candidate Generation Component, which classifies each tweet
into the best-matching profile based on semantic features and
quality features using a keyword Boolean logic model.

4) Scoring and Pushing Component, which ranks the candidate
tweets of each profile by the final score and adjusts the
threshold over time based on historical data.


1. https://github.com/lintool/twitter-tools


2.2.  Feature  Extraction  Component 

We listen to the Twitter stream during the official evaluation
period^2, which lasts ten days. After obtaining the Twitter
stream, we apply preprocessing and filtering to reduce the
number of tweets we need to process. The preprocessing and
filtering steps applied to the tweet stream are as follows,

• Non-English Filtering: we discard non-English tweets using
ldig [10], a language detector based on infinity-grams. This
toolkit is a prototype for short messages with 99.1% accuracy
for 17 languages^3. In addition, we use a method based on
character encoding ranges to process tweets that mix English
and non-English characters: we keep a tweet only if the
proportion of English characters exceeds a threshold value.

• Redundant Retweet Elimination: following the official
requirements, we use the retweet id to keep only one tweet and
eliminate the other tweets that retweet the same tweet.
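
The two filtering steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the system's actual code: the 0.9 ratio is a made-up threshold (the paper does not give its value), ASCII letters stand in for "English characters", and the `id` and `retweeted_status` keys follow the usual Twitter JSON layout.

```python
def english_ratio(text):
    """Fraction of alphabetic characters that are ASCII letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for ch in letters if ch.isascii()) / len(letters)


def is_mostly_english(text, min_ratio=0.9):  # 0.9 is an assumed threshold
    return english_ratio(text) >= min_ratio


def source_id(tweet):
    """Id of the retweeted tweet for a retweet, else the tweet's own id."""
    original = tweet.get("retweeted_status")
    return original["id"] if original else tweet["id"]


def drop_redundant_retweets(tweets):
    """Keep only the first tweet seen for each source id."""
    seen, kept = set(), []
    for tweet in tweets:
        sid = source_id(tweet)
        if sid not in seen:
            seen.add(sid)
            kept.append(tweet)
    return kept
```

In a live stream the `seen` set would have to persist across calls; the batch form is used here only to keep the sketch short.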

Semantic features and social attributes are then extracted
from the tweets. For the semantic features, we select the nouns
and verbs in the tweet text, so the semantic features of a tweet
are represented as Equation 1,

$$ T = \{t_1, t_2, \ldots, t_n\} \tag{1} $$

where $T$ represents a tweet and $t_i$ stands for a keyword in
the tweet text. The social attributes are extracted from the
structured data in a tweet. Because a tweet is structured in
JSON format, it is convenient to obtain the social attributes we
need, such as the user who posted the tweet, the number of
reposts, the number of comments, the number of URLs, the number
of hashtags, the number of meaningful words, and the length of
the tweet.

2. https://github.com/lintool/twitter-tools/wiki/TREC-2015-Track-Guidelines

3. https://github.com/shuyo/ldig

For profiles, we extract the nouns and verbs from the title,
desc and narr fields. We use a keyword Boolean logic model to
express the information of a profile as follows,

P = {tid: xxx, keyword: {0: p1 || p2, 1: p3 && p4}}  (2)

where P represents a profile and tid stands for the topic id of
the profile. The keyword field contains two sub-fields: 0 for
words that are not required but increase the semantic score when
present, and 1 for words that must be included. The symbol ||
denotes logical OR and the symbol && denotes logical AND.
Equation 2 therefore means that, for the profile whose topic id
is xxx, p3 and p4 must both be included, while p1 and p2 are
optional. In this component, we extract the features and store
them in this format.
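
As an illustration of Equation 2, the profile can be stored as a small dictionary and checked against the keywords of a tweet. The function names are hypothetical, and how much each optional hit raises the semantic score is not specified in the text, so the sketch only counts the hits.

```python
def matches_profile(tweet_terms, profile):
    """True if every required keyword (field 1) occurs in the tweet."""
    return set(profile["keyword"][1]) <= set(tweet_terms)


def optional_hits(tweet_terms, profile):
    """Number of optional keywords (field 0) that occur in the tweet."""
    return len(set(profile["keyword"][0]) & set(tweet_terms))


profile = {
    "tid": "xxx",  # placeholder topic id, as in Equation 2
    "keyword": {0: ["p1", "p2"], 1: ["p3", "p4"]},
}
```

A tweet containing p3, p4 and p1 satisfies the profile with one optional hit, while a tweet containing only p3 and p1 is rejected because p4 is missing.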

2.3.  Feature  Representation  Component 

After extracting the semantic features, we need to represent
them in a format that makes it convenient to calculate the
relevance between tweets and profiles. For profiles, the
keywords extracted from the files provided by the organizers are
not sufficient for good performance, because short text
retrieval suffers severely from the vocabulary mismatch problem:
the term overlap between profiles and tweets is relatively
small. Semantic expansion methods can be leveraged to improve
retrieval performance. In this section, we introduce the
semantic expansion methods we use.

There are two kinds of semantic expansion methods: knowledge
repository based and search engine based. For profiles, we use
the Google search engine API to expand the information about the
profiles. The title field is used as the search query, and the
abstract text of the top 50 retrieval results is collected for
each profile. Each abstract text is treated as a document
containing several terms. After gathering all the documents, we
use the TFIDF algorithm to calculate the TFIDF value of each
term for all the profiles. The top k terms of each profile are
added to the keyword table in Equation 2 to expand its
information.
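
The expansion step can be sketched as follows, assuming the search snippets have already been downloaded and tokenized. The exact TFIDF variant is not given in the text; here the term frequency is summed over the snippets of a profile and the document frequency is counted over all snippets of all profiles, which is one plausible reading. `expansion_terms` is a hypothetical name.

```python
import math
from collections import Counter


def expansion_terms(snippets_by_profile, k=3):
    """Return the k highest TF-IDF terms for each profile.

    snippets_by_profile maps a profile id to a list of tokenized snippets.
    """
    all_docs = [doc for docs in snippets_by_profile.values() for doc in docs]
    doc_freq = Counter(term for doc in all_docs for term in set(doc))
    n_docs = len(all_docs)

    top_terms = {}
    for pid, docs in snippets_by_profile.items():
        term_freq = Counter(term for doc in docs for term in doc)
        scores = {
            term: tf * math.log(n_docs / doc_freq[term])
            for term, tf in term_freq.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        top_terms[pid] = ranked[:k]
    return top_terms
```

Terms that appear in the snippets of many profiles receive a low IDF and are unlikely to be selected, which keeps the expansion specific to each profile.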

Because of the vocabulary mismatch problem, a vector model is
used to process the semantic features. The word2vec technique is
used to vectorize the keywords, and the gensim^4 tool is used in
this paper. A word2vec knowledge base is trained with the gensim
tool on the English Wikipedia corpus. Tweets and profiles can
then be represented with the word2vec knowledge base as follows,

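$$ T_{vec} = (t_1, t_2, \ldots, t_n) \tag{3} $$

In Equation 3, n is the number of dimensions configured in the
gensim tool, generally set to 200 or 400. The profiles can be
represented as a matrix as follows,

$$
P_{mat} = \begin{bmatrix}
p_{11} & \cdots & p_{1n} \\
\vdots & \ddots & \vdots \\
p_{m1} & \cdots & p_{mn}
\end{bmatrix} \tag{4}
$$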


In Equation 4, n is the same as in Equation 3 and m stands for
the number of profiles. A row $(p_{i1}, \ldots, p_{in})$ of the
matrix is the normalized center vector of all the keywords of a
profile. After this procedure, the semantic features of tweets
and profiles are represented in the same vector space.
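
A minimal sketch of this representation follows. The two-dimensional `embeddings` table is a toy stand-in for vectors looked up in a trained word2vec model, and representing a tweet by the normalized center of its keyword vectors is an assumption made for the example; the text only states this for profiles.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def center_vector(words, embeddings):
    """Normalized mean of the vectors of the words found in `embeddings`."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return normalize(mean)


def profile_scores(p_mat, t_vec):
    """Dot product of every profile row with the tweet vector."""
    return [sum(p * t for p, t in zip(row, t_vec)) for row in p_mat]


# Toy 2-dimensional embeddings; real ones would come from word2vec.
embeddings = {"solar": [1.0, 0.0], "energy": [0.8, 0.2], "football": [0.0, 1.0]}
p_mat = [center_vector(["solar", "energy"], embeddings),
         center_vector(["football"], embeddings)]
t_vec = center_vector(["solar"], embeddings)
scores = profile_scores(p_mat, t_vec)
```

Because all vectors are normalized, each dot product is a cosine similarity, and the tweet about "solar" scores highest against the first profile.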


2.4.  Candidate  Generation  Component 


In this section, we classify each tweet into its most relevant
profile, or drop it directly if it does not match any profile,
and generate candidates based on the semantic features of
Section 2.3. First, the tweet vector of Equation 3 is multiplied
by the profile matrix of Equation 4, which yields one similarity
value per profile (Equation 5). Then, the profile that has the
maximum value and whose Boolean logic in Equation 2 is satisfied
by the terms of the tweet is chosen as the candidate. The
semantic score $c_i$ is recorded at the same time. We use two
kinds of semantic scores to evaluate the relevance between
tweets and profiles, as follows.


• TFIDF Score, which calculates the cosine similarity between
a tweet and a profile in the vector space model with TFIDF term
weights. The vector space model represents a document as a
vector, so tweets and profiles can be expressed as vectors,

$$ \vec{T} = (t_1, t_2, \ldots, t_n) \tag{6} $$

$$ \vec{P} = (p_1, p_2, \ldots, p_n) \tag{7} $$

The TFIDF method uses term weights and the cosine similarity
metric to evaluate the relevance between a tweet and a profile.
The cosine similarity metric is defined as follows,

$$ Sim = \frac{\vec{T} \cdot \vec{P}}{|\vec{T}| \, |\vec{P}|} \tag{8} $$


• BM25 Score, which uses the Okapi BM25 weighting function to
measure the semantic relevance between a tweet and a profile.
Okapi BM25 is a bag-of-words model that ranks documents based on
the query terms appearing in each document. The similarity
between a document and a query is defined in Equation 9, where D
represents a document and Q stands for a query, $f(q_i, D)$ is
the term frequency of $q_i$ in document D, $|D|$ is the length
of document D in words, avgdl is the average document length
over all the documents to process, and $k_1$ and $b$ are
adjustable parameters.

$$ Sim = \sum_{q_i \in Q} IDF(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{avgdl}\right)} \tag{9} $$

4. http://radimrehurek.com/gensim/index.html
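
The two semantic scores can be sketched as follows. This is an illustration rather than the system's implementation: the defaults k1 = 1.2 and b = 0.75 and the smoothed IDF formula are common choices assumed here, since the paper does not state which values or IDF variant were used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of two sparse term-weight vectors given as {term: weight}."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of `doc` for `query`; `docs` is the whole collection.

    Documents and the query are lists of tokens. k1, b and the IDF
    formula are assumed defaults, not values taken from the paper.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    score = 0.0
    for term in set(query):
        df = sum(1 for d in docs if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        tf = doc.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score
```

In this setting the profile keywords play the role of the query Q and each tweet plays the role of the document D.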

The social attributes extracted in Section 2.2 are used to
train the quality model. We manually label each tweet with a
score from 0 to 1 based on its quality: a tweet that provides
more information and is written more carefully receives a higher
quality score. The model we use is the logistic regression model
in the machine learning tool Weka^5. The semantic score and the
quality score are then used to evaluate the relevance and
quality of a tweet for a given profile. Based on the assumption
that users prefer tweets that are related to the profile and
popular in social media, we consider the following social
attributes,

• User Follower Count (FollowerCnt), which represents the
number of followers of the user who posted the tweet. A user
with a high follower count is likely to be well known in social
media and to post high-quality tweets.

• User Status Count (StatusCnt), which represents the number
of statuses of the user who posted the tweet. The status count
indicates how active a user is in social media. We assume that
an active user posts higher-quality tweets than others.

• Retweet Count (RetweetCnt), which represents the number of
times a tweet is retweeted. The larger the retweet count is,
the more popular the tweet is in social media.

• Retweet Level (RetweetLvl), which is the logarithm of the
retweet count.

• Collect Count (CollectCnt), which represents the number of
people who like the tweet. People can collect or star a tweet
if they find it attractive.

• Word Count (WordCnt), which is the number of words in a
tweet excluding stop words. In general, informative and
high-quality tweets may be longer than others.

• Character Count (CharCnt), which is the number of characters
in a tweet excluding stop words.

• Short URL Count (UrlCnt), which represents the number of
short URLs in a tweet. Informative tweets and news generally
give a short URL at the end of the tweet.
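
The attributes above map naturally onto fields of the tweet JSON. The sketch below assumes the Twitter REST API v1.1 layout (`user.followers_count`, `user.statuses_count`, `retweet_count`, `favorite_count`, `entities.urls`); the stop-word list, the use of log(1 + x) for the retweet level, and the weights passed to `quality_score` are illustrative assumptions, since the trained Weka model is not given in the paper.

```python
import math

STOP_WORDS = {"a", "an", "the", "of", "to", "in", "is", "and"}  # illustrative


def quality_features(tweet):
    """Map a tweet (Twitter REST API v1.1 JSON layout) to the attributes above."""
    words = [w for w in tweet["text"].split() if w.lower() not in STOP_WORDS]
    retweets = tweet.get("retweet_count", 0)
    return {
        "FollowerCnt": tweet["user"]["followers_count"],
        "StatusCnt": tweet["user"]["statuses_count"],
        "RetweetCnt": retweets,
        "RetweetLvl": math.log(1 + retweets),  # log(1 + x) avoids log(0)
        "CollectCnt": tweet.get("favorite_count", 0),
        "WordCnt": len(words),
        "CharCnt": sum(len(w) for w in words),
        "UrlCnt": len(tweet.get("entities", {}).get("urls", [])),
    }


def quality_score(features, weights, bias=0.0):
    """Logistic regression output in (0, 1) for already-trained weights."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    z = max(-60.0, min(60.0, z))  # keep exp() within floating-point range
    return 1.0 / (1.0 + math.exp(-z))
```

In practice the raw counts would probably be scaled before training, because follower and status counts are several orders of magnitude larger than the other attributes.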

2.5.  Scoring  and  Pushing  Component 

From the semantic features and social attributes we obtain two
scores: the semantic score $c_i$ of Equation 5 and the quality
score $q_i$. Both values range from 0 to 1. The final score of a
tweet for a profile is computed as follows, where $s_i$ stands
for the final score.

$$ s_i = c_i \cdot q_i \tag{10} $$

5. http://www.cs.waikato.ac.nz/ml/weka/


When a candidate is added to the pushing queue, it is ranked
by the final score $s_i$. If a tweet is relevant and important
to a profile, it should be pushed to the users who are
interested in that profile. However, Scenario A limits the
number of tweets pushed to a profile to at most ten per day, and
the gain decreases over time, so we need to handle a constraint
satisfaction problem. We use dynamic threshold adjustment to
make sure that each profile receives enough tweets during one
day and that each pushed tweet has a high score. From the recent
historical data of the tweets for a profile, we obtain the
highest final score $s_{max}$. We define the threshold as the
piecewise function in Equation 11,


$$
threshold = \begin{cases}
(0.9 - d) \cdot s_{max}, & d < 0.4 \\
0.5 \cdot s_{max}, & d \geq 0.4
\end{cases} \tag{11}
$$


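where d stands for the decay value, $d = c \cdot \lfloor t/2 \rfloor$,
c is the decay coefficient, which we set to 0.05 in our system,
and t is the hour of the day from 0 to 24. The threshold
therefore falls from 0.9 times the highest final score at the
start of the day to 0.5 times that score from hour 16 onward. If
the final score $s_i$ of a tweet exceeds the threshold at that
time, the tweet is pushed immediately.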
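
The threshold schedule can be written as a short function. The sketch assumes that the decay value is the product of c and floor(t/2), which is the reading under which d stays between 0 and 0.6 over a day; `push_threshold` is a hypothetical name.

```python
import math


def push_threshold(hour, s_max, c=0.05):
    """Threshold of Equation 11 at a given hour of the day (0 to 24)."""
    d = c * math.floor(hour / 2)  # decay grows by c every two hours
    if d < 0.4:
        return (0.9 - d) * s_max
    return 0.5 * s_max
```

With a highest recent score of 1.0, the threshold is 0.9 at midnight, 0.75 at 06:00, and 0.5 from 16:00 until the end of the day, so the system becomes less selective as the day runs out.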

The live push algorithm based on semantic features and social
attributes described above is summarized in Algorithm 1. The
program does not stop until the retrieval set R of each profile
is full with 10 tweets or the stream Ts is exhausted for the
day. The threshold is adjusted automatically over time.


Algorithm 1 Live Push Algorithm

Require:
    Twitter stream Ts = {ts_1, ts_2, ..., ts_k}
    Profile document set P = {P_1, P_2, ..., P_m}
Ensure:
    Retrieval set R = {R_1, R_2, ..., R_m}, where R_i for each
    profile P_i is full or Ts = ∅

P_mat = matrix(P)
while R is not full and time is not up do
    ts_i = pop(Ts)
    preprocess(ts_i)
    T_vec = vectorization(ts_i)
    C = P_mat · T_vec
    c_j = max(C)
    s_j = c_j · q_j
    if s_j > threshold then
        R_j = R_j ∪ {ts_i}
    end if
end while
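
A compact executable sketch of the loop in Algorithm 1 is given below. The semantic score, quality score and threshold are passed in as caller-supplied functions (`semantic_fn`, `quality_fn` and `threshold_fn` are hypothetical names), so the sketch shows only the control flow and not the scoring models themselves.

```python
def live_push(stream, profiles, semantic_fn, quality_fn, threshold_fn, daily_limit=10):
    """Assign each tweet to its best profile and push it if it clears the threshold.

    semantic_fn(tweet, profile) gives c, quality_fn(tweet) gives q, and
    threshold_fn(profile_id) gives the current threshold for that profile.
    """
    pushed = {pid: [] for pid in profiles}
    for tweet in stream:
        if all(len(r) >= daily_limit for r in pushed.values()):
            break  # every retrieval set R_j is full
        scores = {pid: semantic_fn(tweet, prof) for pid, prof in profiles.items()}
        best = max(scores, key=scores.get)          # c_j = max(C)
        final = scores[best] * quality_fn(tweet)    # s_j = c_j * q_j
        if final > threshold_fn(best) and len(pushed[best]) < daily_limit:
            pushed[best].append(tweet)
    return pushed
```

A real deployment would also reset `pushed` at the start of each day and update the threshold from the recent score history.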


3.  Periodic  Email  Digest  Task 

In the periodic email digest task, we need to collect a batch
of up to 100 of the most interesting tweets for each profile
during one day and deliver them to that profile after the day
ends. The system is expected to complete this in a relatively
short amount of time. The system framework used in Scenario B is
the same as in Scenario A (Figure 1), except for the threshold
adjustment component. Each tweet is classified into one profile,
or dropped if it does not match any profile, and the candidates
are then ranked by the final score s based on semantic features
and social attributes.

To supply diverse information for a particular profile, we use
two techniques to eliminate redundant tweets.

• Redundancy Removal based on Id, which uses the tweet id to
identify a tweet. If a tweet is original, we record its own id.
If it reposts another tweet, we record the id of the reposted
tweet. This reduces the number of tweets that repost the same
popular tweet.

• Simhash [11] [12], which is a popular method for detecting
redundant web pages. It turns a document into a fingerprint,
called the simhash code. The smaller the Hamming distance
between the simhash codes of two documents is, the more similar
the documents are. The simhash code is calculated as follows,

$$ Simhash = sign\left(\sum_{i=1}^{n} w_i \cdot c_i\right) \tag{12} $$

where $w_i$ is the weight of term i, $c_i$ is the hash code of
term i, and sign is the sign function that maps positive values
to 1 and negative values to 0 for every bit of the code.
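
The simhash fingerprint and the Hamming distance can be sketched as follows. MD5 truncated to 64 bits is an arbitrary choice of term hash made for the example, and the bitwise voting (a 1 bit adds the weight, a 0 bit subtracts it) is the standard way to evaluate the weighted sum in Equation 12; neither detail is taken from the paper.

```python
import hashlib


def term_hash(term, bits=64):
    """Stable `bits`-bit hash of a term (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(weighted_terms, bits=64):
    """Fingerprint of a document given as {term: weight}."""
    totals = [0.0] * bits
    for term, weight in weighted_terms.items():
        code = term_hash(term, bits)
        for i in range(bits):
            # A 1 bit votes +weight, a 0 bit votes -weight.
            totals[i] += weight if (code >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if totals[i] > 0)


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two tweets would then be treated as redundant when the Hamming distance between their fingerprints falls below a small cutoff; the cutoff itself is a tuning parameter that the text does not specify.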

Our daily retrieval algorithm is described in Algorithm 2.


Algorithm 2 Daily Retrieval Algorithm

Require:
    Twitter retrieval set Tr = {Tr_1, Tr_2, ..., Tr_m} based on
    Scenario A
Ensure:
    Daily retrieval set Dr = {Dr_1, Dr_2, ..., Dr_m}

for Tr_i ∈ Tr do
    while |Dr_i| < N and Tr_i ≠ ∅ do
        t_max = max(Tr_i)
        Tr_i = Tr_i - {t_max}
        if t_max not in Dr_i then
            Dr_i = Dr_i ∪ {t_max}
        end if
    end while
end for


Here Tr is the set of daily candidates for the m profiles
acquired in Scenario A. For each profile, we iteratively take
the most interesting tweet from the candidate set and drop it if
it is redundant. Finally, we obtain the daily retrieval set Dr.
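
For one profile, the selection loop of Algorithm 2 can be sketched as follows. The `is_redundant` hook is a hypothetical parameter standing in for the id-based and simhash-based checks; its default only rejects exact duplicates.

```python
def daily_digest(candidates, max_size=100, is_redundant=None):
    """Pick up to max_size tweets from (score, tweet) pairs, best first.

    is_redundant(tweet, chosen) decides whether a tweet duplicates one
    that was already chosen.
    """
    if is_redundant is None:
        is_redundant = lambda tweet, chosen: tweet in chosen
    remaining = sorted(candidates, key=lambda pair: pair[0], reverse=True)
    digest = []
    for _, tweet in remaining:
        if len(digest) >= max_size:
            break
        if not is_redundant(tweet, digest):
            digest.append(tweet)
    return digest
```

Sorting once and scanning in order is equivalent to repeatedly removing the maximum from the candidate set, as the pseudocode does.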

4.  Result  and  Analysis 

The evaluation of the TREC 2015 Microblog track lasts 10 days,
from Monday, July 20, 2015, 00:00:00 UTC to July 29, 2015,
23:59:59 UTC. It consists of 225 interest profiles, which the
participants are responsible for tracking. During the evaluation
period, participants listen to the tweet stream continuously and
process every tweet. After the evaluation period, based on post
hoc analysis, NIST selects a set of approximately 50 topics to
be assessed.

TABLE 1. Results in scenario A

Run        ELG      nCG
SNACSA     0.3086   0.3349
SNACS_LA   0.2863   0.2974
summaryA   0.4623   0.4846

TABLE 2. Results in scenario B

Run        nDCG
SNACS      0.3345
SNACS_LB   0.3670
summaryB   0.5014

Several metrics are used to evaluate the performance of a
system. In Scenario A, the primary metric is expected
latency-discounted gain (ELG) from the temporal summarization
track. The ELG score is defined in Equation 13,

$$ ELG = \frac{1}{|Tr|} \cdot \sum_{i} gain(Tr_i) \tag{13} $$

where Tr is the set of returned tweets and gain() is the score
function for a tweet. Not interesting and spam/junk tweets
receive a gain of 0, somewhat interesting tweets receive a gain
of 0.5, and very interesting tweets receive a gain of 1.0. In
addition, a latency penalty is applied to all tweets. The
penalty is computed as max(0, (100 - delay)/100), where the
delay is the time elapsed (in minutes, rounded down) between the
tweet creation time and the putative time the tweet is
delivered. The secondary metric is normalized cumulative gain
(nCG), which is defined in Equation 14,

$$ nCG = \frac{1}{Z} \cdot \sum_{i} gain(Tr_i) \tag{14} $$

where  Z  is  the  maximum  possible  gain  (given  the  10  tweets 
per  day  limit). 
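
The two Scenario A metrics can be sketched as below. The sketch assumes that the latency penalty multiplies the gain of each tweet in both metrics and that an empty result scores 0; the official evaluation script may treat these details differently.

```python
def latency_penalty(delay_minutes):
    """max(0, (100 - delay) / 100) with the delay rounded down to minutes."""
    return max(0.0, (100 - int(delay_minutes)) / 100)


def elg(pushed):
    """pushed: list of (gain, delay_minutes) pairs for the returned tweets."""
    if not pushed:
        return 0.0
    return sum(g * latency_penalty(d) for g, d in pushed) / len(pushed)


def ncg(pushed, max_gain):
    """max_gain is Z, the maximum possible gain under the push limit."""
    if max_gain == 0:
        return 0.0
    return sum(g * latency_penalty(d) for g, d in pushed) / max_gain
```

For example, three pushed tweets with gains 1.0, 0.5 and 0 and delays of 0, 50 and 10 minutes contribute 1.0, 0.25 and 0, so ELG is 1.25 / 3 and, with Z = 2.5, nCG is 0.5.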

In Scenario B, for each topic, the list of tweets returned per
day is treated as a ranked list, from which nDCG@k is computed.
The score of a topic is the average of the nDCG@k scores across
all days in the evaluation period. The score of a run is the
average over all topics.
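
The averaging scheme can be sketched as follows. The logarithmic discount and the default k = 10 are assumptions made for the example (the text does not give the value of k or the exact nDCG variant), and `run_score` is a hypothetical helper name.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_gains, all_relevant_gains, k=10):
    """nDCG@k of one day's ranked list for one topic."""
    ideal = dcg(sorted(all_relevant_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0


def run_score(per_topic_daily_scores):
    """Average over days within each topic, then over topics."""
    topic_means = [sum(days) / len(days) for days in per_topic_daily_scores.values()]
    return sum(topic_means) / len(topic_means)
```

Here `ranked_gains` holds the gains of the returned tweets in ranked order, and `all_relevant_gains` holds the gains of all judged tweets for that topic and day, from which the ideal ranking is built.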

The results of our system are listed in Table 1 and Table 2.

SNACSA and SNACS are the pair of runs that use only the words
in tweets to generate the profiles of Equation 2. SNACS_LA and
SNACS_LB are the pair of runs that additionally use search
engine expansion to generate the profiles. The summaryA and
summaryB rows give the average, over all topics, of the highest
score obtained for each topic. The non-expanded runs obtain
higher ELG and nCG, whereas the expanded run obtains higher
nDCG.

Figure 2 shows the ELG versus nCG of the participants’ runs,
Figure 3 the ELG distribution over topics, Figure 4 the nCG
distribution over topics, and Figure 5 the nDCG distribution
over topics. For most topics, our system is close to the maximum
results in summaryA and summaryB, which supports the
effectiveness and efficiency of our algorithm.
Figure 2. ELG vs. nCG


Figure 3. ELG distribution in different topics


Figure 4. nCG distribution in different topics


5.  Conclusion 

In this paper, we present our system architecture and
algorithms for the TREC 2015 Microblog track. In the push
notification task on a mobile phone, we apply a recommendation
framework with a ranking algorithm and dynamic threshold
adjustment that uses not only semantic features but also social
attributes in social media. In the periodic email digest task,
we calculate the score of a tweet from semantic features and
quality features, and then rank the tweets while taking
redundancy into consideration. The experimental results show the
effectiveness and efficiency of our system in both tasks.


Figure 5. nDCG distribution in different topics